Achieving Fault Detection and Performance on CMPs

Authors

  • Gordon B. Bell
  • Mikko H. Lipasti
Abstract

Technology scaling in integrated circuits has consistently provided dramatic performance improvements in modern microprocessors. However, increasing device counts and decreasing on-chip voltage levels have made transient errors a first-order design constraint that can no longer be ignored. Several proposals have provided fault detection and tolerance through redundantly executing a program on an additional hardware thread or core. While such techniques can provide high fault coverage, they at best provide equivalent performance to the original execution and at worst incur a slowdown due to error checking, contention for shared resources, and synchronization overheads. This work achieves a similar goal of detecting transient errors by redundantly executing a program on additional processor cores; however, by sacrificing a small degree of fault coverage, it speeds up (rather than slows down) program execution compared to the unprotected baseline case. This scheme exploits the fact that a small number of instructions are detrimental to overall performance, and selectively skipping them enables some cores to advance far ahead of others and obtain prefetching and large instruction window benefits. We highlight the incremental hardware required and show that reducing fault coverage from 100% to an average of 96%/81% can result in a speedup of 15%/150% for a collection of integer/floating point benchmarks.
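The trade-off described in the abstract, where some instructions are skipped by the redundant execution and therefore go unchecked, can be illustrated with a toy model. The sketch below is not the authors' mechanism: the `Instr` record, the `skipped` flag, and the 25-instruction trace are all hypothetical, and it simply assumes that fault coverage is the fraction of dynamic instructions whose results are compared against a redundant copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """One dynamic instruction in a hypothetical trace (illustrative only)."""
    pc: int
    result: int
    skipped: bool  # True if the redundant copy skips this instruction


def fault_coverage(trace):
    """Fraction of dynamic instructions whose result is redundantly checked."""
    if not trace:
        return 0.0
    checked = sum(1 for ins in trace if not ins.skipped)
    return checked / len(trace)


def detects_fault(trace, faulty_index, flipped_bit=0):
    """Return True if a single bit flip at faulty_index would be caught.

    Assumption: a flip is caught only when the instruction is executed
    redundantly, so the corrupted result can be compared with a clean copy.
    """
    ins = trace[faulty_index]
    corrupted = ins.result ^ (1 << flipped_bit)
    return (not ins.skipped) and corrupted != ins.result


# Hypothetical trace: 25 instructions, exactly one of which is skipped.
trace = [Instr(pc=4 * i, result=i * 3, skipped=(i == 7)) for i in range(25)]

print(fault_coverage(trace))    # 24 of 25 instructions are checked
print(detects_fault(trace, 7))  # fault lands on the skipped instruction
print(detects_fault(trace, 8))  # fault lands on a checked instruction
```

In this model, skipping one instruction in 25 leaves 96% of the instructions checked, and a transient fault goes undetected only if it strikes the skipped instruction. The speedup side of the trade-off depends on microarchitectural effects (prefetching, a larger effective instruction window) that this sketch does not attempt to model.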

Related resources

Performance-asymmetry-aware scheduling for Chip Multiprocessors with static core coupling

Thread-level redundancy is an efficient approach for transient fault detection and recovery in Chip Multiprocessors (CMPs), in which two adjacent cores are statically coupled to form a functional Dual Modular Redundancy (DMR). Manufacturing process variations cause core-to-core (C2C) performance asymmetry across the chip, which can be further divided into the asymmetry among core-pairs and the ...

Integrated Fault-detection and Control of DC Microgrids Using SDRE Observer-controller

In this paper, using the state-dependent Riccati equation (SDRE) technique, a suboptimal fault-tolerant control scheme is designed for a DC microgrid in the islanded mode. The objectives are voltage control of the photovoltaic cell, the battery, the capacitor bank, and the DC bus, as well as on-time fault detection. In the design procedure of the SDRE observer-controller, a nonlinear mathe...

FDMG: Fault detection method by using genetic algorithm in clustered wireless sensor networks

Wireless sensor networks (WSNs) consist of a large number of sensor nodes which are capable of sensing different environmental phenomena and sending the collected data to the base station or Sink. Since sensor nodes are made of cheap components and are deployed in remote and uncontrolled environments, they are prone to failure; thus, maintaining a network with its proper functions even when und...

Examining Current Commodity CMPs for Fault Isolation

chip level. In turn, this enables cost benefits from reduced component count. Additionally, enhanced resource sharing leads to better performance. On-chip components can now be easily shared to improve resource utilization, such as core sharing via hyperthreading, shared caches, and I/O interfaces. However, the same features of multicore processors that offer benefits can also present drawbacks...

Power Efficient Redundant Execution for Chip Multiprocessors

This paper describes the design of a power efficient microarchitecture for transient fault detection in chip multiprocessors (CMPs). We introduce a new per-core dynamic voltage and frequency scaling (DVFS) algorithm for our architecture that significantly reduces power dissipation for redundant execution with a minimal performance overhead. Using cycle accurate simulation combined with a simple ...

Publication date: 2004